
    Design and training of deep reinforcement learning agents

    Deep reinforcement learning is a field of research at the intersection of reinforcement learning and deep learning. On one side, the problem that researchers address is the one of reinforcement learning: to act efficiently. A large number of algorithms were developed decades ago in this field to update value functions and policies, explore, and plan. On the other side, deep learning methods provide powerful function approximators to address the problem of representing functions such as policies, value functions, and models. The combination of ideas from these two fields offers exciting new perspectives. However, building successful deep reinforcement learning experiments is particularly difficult due to the large number of elements that must be combined and adjusted appropriately. This thesis proposes a broad overview of the organization of these elements around three main axes: agent design, environment design, and infrastructure design. Arguably, the success of deep reinforcement learning research is due to the tremendous amount of effort that went into each of them, both from a scientific and engineering perspective, and their diffusion via open source repositories. For each of these three axes, a dedicated part of the thesis describes a number of related works that were carried out during the doctoral research. The first part, devoted to the design of agents, presents two works. The first one addresses the problem of applying discrete action methods to large multidimensional action spaces. A general method called action branching is proposed, and its effectiveness is demonstrated with a novel agent, named BDQ, applied to discretized continuous action spaces. The second work deals with the problem of maximizing the utility of a single transition when learning to achieve a large number of goals. In particular, it focuses on learning to reach spatial locations in games and proposes a new method called Q-map to do so efficiently. 
An exploration mechanism based on this method is then used to demonstrate the effectiveness of goal-directed exploration. Elements of these works cover some of the main building blocks of agents: update methods, neural architectures, exploration strategies, replays, and hierarchy. The second part, devoted to the design of environments, also presents two works. The first one shows how various tasks and demonstrations can be combined to learn complex skill spaces that can then be reused to solve even more challenging tasks. The proposed method, called CoMic, extends previous work on motor primitives by using a single multi-clip motion capture tracking task in conjunction with complementary tasks targeting out-of-distribution movements. The second work addresses a particular type of control method vastly neglected in traditional environments but essential for animals: muscle control. An open source codebase called OstrichRL is proposed, containing a musculoskeletal model of an ostrich, an ensemble of tasks, and motion capture data. The results obtained by training a state-of-the-art agent on the proposed tasks show that controlling such a complex system is very difficult and illustrate the importance of using motion capture data. Elements of these works demonstrate the meticulous work that must go into designing environment parts such as: models, observations, rewards, terminations, resets, steps, and demonstrations. The third part, on the design of infrastructures, presents three works. The first one explains the difference between the types of time limits commonly used in reinforcement learning and why they are often treated inappropriately. In one case, tasks are time-limited by nature and a notion of time should be available to agents to maintain the Markov property of the underlying decision process. In the other case, tasks are not time-limited by nature, but time limits are used for convenience to diversify experiences. This is the most common case. 
It requires a distinction between time limits and environmental terminations, and bootstrapping should be performed at the end of partial episodes. The second work proposes to unify the most popular deep learning frameworks using a single library called Ivy, and provides new differentiable and framework-agnostic libraries built with it. Four such codebases are provided for gradient-based robot motion planning, mechanics, 3D vision, and differentiable continuous control environments. Finally, the third work proposes a novel deep reinforcement learning library, called Tonic, built with simplicity and modularity in mind, to accelerate prototyping and evaluation. In particular, it contains implementations of several continuous control agents and a large-scale benchmark. Elements of these works illustrate the different components to consider when building the infrastructure for an experiment: deep learning framework, schedules, and distributed training. Added to these are the various ways to perform evaluations and analyze results for meaningful, interpretable, and reproducible deep reinforcement learning research.
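    The two time-limit cases described in the abstract can be sketched as follows. This is a minimal illustration, not code from the thesis: the function names `td_targets` and `add_remaining_time`, the one-step target form, and the normalized remaining-time feature are all assumptions made here for the example.

```python
import numpy as np


def td_targets(rewards, next_values, terminated, discount=0.99):
    """One-step TD targets that bootstrap unless the environment terminated.

    A transition cut off by a time limit (a partial episode) is passed with
    terminated=False, so the value of its next state is still used.
    """
    rewards = np.asarray(rewards, dtype=float)
    next_values = np.asarray(next_values, dtype=float)
    terminated = np.asarray(terminated, dtype=bool)
    # Zero out the bootstrap term only for true environmental terminations.
    return rewards + discount * next_values * ~terminated


def add_remaining_time(observation, step, max_steps):
    """Append the normalized remaining time for tasks that are time-limited
    by nature (hypothetical helper; the feature choice is an assumption)."""
    remaining = 1.0 - step / max_steps
    return np.append(np.asarray(observation, dtype=float), remaining)


# Second transition: environmental termination (no bootstrap).
# Third transition: cut by a time limit, so terminated=False and we bootstrap.
targets = td_targets(
    rewards=[1.0, 1.0, 1.0],
    next_values=[10.0, 10.0, 10.0],
    terminated=[False, True, False],
    discount=0.5,
)
print(targets)

# Time-aware observation for a task with a natural limit of 100 steps.
print(add_remaining_time([0.2, -0.1], step=25, max_steps=100))
```

    The point of the sketch is that a time-limit cutoff must not be folded into the `terminated` flag; otherwise the target would wrongly treat the cut-off state as having zero future value.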

    Aspects of geodesical motion with Fisher-Rao metric: classical and quantum

    The purpose of this article is to exploit the geometric structure of quantum mechanics and of statistical manifolds to study the qualitative effect that quantum properties have on the statistical description of a system. We show that the end points of geodesics in the classical setting coincide with the probability distributions that minimise Shannon's entropy, i.e. with distributions of zero dispersion. In the quantum setting this happens only for particular initial conditions, which in turn correspond to classical submanifolds. This result can be interpreted as a geometric manifestation of the uncertainty principle. Comment: 15 pages, 5 figures
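    A standard textbook case may help picture the classical statement. The choice of the one-dimensional Gaussian family is an assumption made here; the abstract does not say which families the article treats. For Gaussians with mean $\mu$ and standard deviation $\sigma > 0$, the Fisher-Rao metric is hyperbolic, and every geodesic other than a vertical line $\mu = \mu_0$ is a half-ellipse whose two end points lie on the boundary $\sigma = 0$, where the differential entropy is unbounded below:

```latex
\begin{align}
  ds^2 &= \frac{d\mu^2 + 2\,d\sigma^2}{\sigma^2}, \\
  \frac{(\mu - \mu_0)^2}{2} + \sigma^2 &= R^2
    \quad\Longrightarrow\quad
    \sigma \to 0 \ \text{at} \ \mu = \mu_0 \pm \sqrt{2}\,R, \\
  h(\mu, \sigma) &= \tfrac{1}{2}\ln\!\left(2\pi e\,\sigma^2\right)
    \;\longrightarrow\; -\infty \quad \text{as } \sigma \to 0 .
\end{align}
```

    Here $\mu_0$ and $R > 0$ are free parameters labelling the geodesic, and $h$ is the differential entropy of the Gaussian.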

    Hamilton-Jacobi approach to Potential Functions in Information Geometry

    The search for a potential function $S$ that allows one to reconstruct a given metric tensor $g$ and a given symmetric covariant tensor $T$ on a manifold $\mathcal{M}$ is formulated as the Hamilton-Jacobi problem associated with a canonically defined Lagrangian on $T\mathcal{M}$. The connection between this problem, the geometric structure of the space of pure states of quantum mechanics, and the theory of contrast functions of classical information geometry is outlined. Comment: 16 pages. A discussion on the Kullback-Leibler divergence has been added. To appear in Journal of Mathematical Physics

    The complementary school day and the importance of training secondary school teachers at Colegio Enrique Pardo Parra through the implementation of ICT in the municipality of Cota

    This document discusses public education in Colombia, which has sought to advance but faces gaps that hold it back. Public entities in cities and municipalities are beginning to look for solutions to bring it to a level similar to that of private education. Specifically, it addresses the extended school day, a mechanism that public administrations are beginning to adopt to improve students' academic performance. The ultimate goal is better academic performance, so that better-prepared professionals can manage the country and make it more competitive nationally and internationally. The document therefore analyzes the extended school day as a mechanism. A thorough literature review is carried out to develop the proposed objectives and to determine the importance of education, supported by ICT, for the development of a nation, with a view to running training sessions for teachers. Education makes it possible to reach good levels of social welfare, as well as the political and economic growth of a country. Education is the future of people's progress and of the advancement of a well-rounded society; this monograph analyzes teacher training through the use of ICT (videos and didactic activities implemented with software). We live in a community where we must strive harder every day to achieve our dreams, and a good education is essential to fulfilling them. Education is the basis of every society: with it, one learns to argue rather than to impose ideas on others through violence. It likewise improves communication and the handling of problems in our societies. That is why we must look for new, more effective ways to educate and train the new generations.